University of Texas at Austin

Upcoming Event: PhD Dissertation Defense

Fast Algorithms for Minimally Supervised Learning

Youguang Chen, Ph.D. Candidate

12–3 PM
Monday, April 13, 2026

POB 6.304

Abstract

Supervised learning has achieved remarkable success across scientific domains; however, in many practical settings labeled data are scarce, expensive, or infeasible to obtain. This dissertation develops fast, theoretically grounded algorithms for minimally supervised learning, with emphasis on statistical guarantees, spectral approximation, and scalable computation in high-dimensional regimes.

The first part studies density-based clustering. We establish a precise relationship between DBSCAN and minimum spanning trees and analyze a scalable approximation, kNN-DBSCAN, which replaces range queries with k-nearest-neighbor graphs. We characterize conditions under which the two formulations are equivalent and provide complexity analysis for a hybrid MPI/OpenMP implementation. The resulting method achieves provable scalability while preserving clustering structure in high dimensions.
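
The construction can be made concrete with a short sequential sketch in Python: range queries are replaced by a single k-nearest-neighbor search, core points are identified from their kNN distances, and clusters are read off as connected components of the core–core graph. This is our illustration of the idea, assuming k ≥ minPts and omitting border-point assignment; the dissertation's kNN-DBSCAN is a parallel hybrid MPI/OpenMP implementation, and all names below are ours.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import NearestNeighbors

def knn_dbscan(X, eps, min_pts, k):
    """Sequential sketch of the kNN-DBSCAN idea (illustrative, not the thesis code)."""
    n = X.shape[0]
    dist, idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
    # Core test from the kNN graph alone: with k >= min_pts, a point is core
    # iff its min_pts-th nearest neighbor (self included) lies within eps.
    core = dist[:, min_pts - 1] <= eps
    # Keep only edges of length <= eps that join two core points.
    rows = np.repeat(np.arange(n), k)
    cols, vals = idx.ravel(), dist.ravel()
    keep = (vals <= eps) & core[rows] & core[cols]
    graph = csr_matrix((np.ones(keep.sum()), (rows[keep], cols[keep])), shape=(n, n))
    # Clusters are the connected components of the symmetrized core-core graph.
    _, labels = connected_components(graph, directed=False)
    labels[~core] = -1   # noise; border-point assignment omitted in this sketch
    return labels
```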

The second part addresses representative sample selection via optimal experimental design. Building on regret minimization and online convex optimization, we derive near-optimal guarantees for A-, D-, E-, V-, and G-optimal objectives. Using matrix inequalities and spectral arguments, including applications of the Golden–Thompson inequality, we establish performance bounds for entropy-regularized and ℓ1/2-regularized relaxations. These results yield computationally tractable algorithms with approximation guarantees for large-scale design problems.
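
As a toy illustration of what an entropy-regularized relaxation means in this setting, the sketch below maximizes the continuous D-optimal objective log det M(w), with M(w) = Σᵢ wᵢ xᵢ xᵢᵀ, over the probability simplex by exponentiated gradient ascent (entropy mirror descent). This is a standard scheme shown only to make the objective concrete; it is not the dissertation's regret-minimization algorithm, and the step size and names are our choices.

```python
import numpy as np

def d_optimal_weights(X, steps=200, eta=0.5):
    """Exponentiated-gradient ascent on log det M(w) over the simplex."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                        # uniform initial design
    for _ in range(steps):
        M = X.T @ (w[:, None] * X)                 # information matrix M(w)
        Minv = np.linalg.inv(M)
        g = np.einsum("ij,jk,ik->i", X, Minv, X)   # grad_i = x_i^T M^{-1} x_i
        w *= np.exp(eta * (g - g.max()))           # multiplicative (entropy) step
        w /= w.sum()                               # stay on the simplex
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w = d_optimal_weights(X)
print("support size:", int(np.sum(w > 1e-3)))      # optimal designs concentrate on few points
```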

The third part develops Fisher Information Ratio Active Learning (FIRAL), a scalable algorithm for pool-based active learning in multinomial logistic regression. Under sub-Gaussian assumptions, we derive non-asymptotic upper and lower bounds relating the Fisher Information Ratio to excess risk, and prove near-optimal performance guarantees for the proposed relaxation and rounding procedures. To address computational bottlenecks, we introduce Approx-FIRAL, which exploits structured Hessian approximations and preconditioned conjugate gradient methods to reduce storage and computational complexity while preserving statistical guarantees.
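
The main computational primitive behind such Hessian-based methods can be illustrated without any of the dissertation's specific approximations: the multinomial logistic Hessian never needs to be formed, since its action on a vector costs only a few matrix products, and linear systems with it can then be solved by conjugate gradient. The sketch below is our generic illustration, using plain (unpreconditioned) CG and a small ridge term for positive definiteness; the structured approximation and preconditioner of Approx-FIRAL are not reproduced here.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def softmax(Z):
    Z = Z - Z.max(axis=0, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=0, keepdims=True)

def hessian_operator(W, X, lam=1e-3):
    """Matrix-free (C*d x C*d) Hessian of the ridge-regularized multinomial
    logistic loss at weights W (C x d) on data X (n x d)."""
    C, d = W.shape
    P = softmax(W @ X.T)                           # class probabilities, (C, n)
    def matvec(v):
        V = v.reshape(C, d)
        A = V @ X.T                                # a_n = V x_n for every sample
        B = P * A
        S = B - P * B.sum(axis=0, keepdims=True)   # (diag(p_n) - p_n p_n^T) a_n
        return (S @ X).ravel() + lam * v           # sum_n (...) x_n^T, plus ridge
    return LinearOperator((C * d, C * d), matvec=matvec)

rng = np.random.default_rng(0)
X, W = rng.standard_normal((500, 10)), rng.standard_normal((3, 10))
b = rng.standard_normal(3 * 10)
sol, info = cg(hessian_operator(W, X), b)          # solve H v = b without forming H
print("CG converged:", info == 0)
```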

The final part investigates Bayesian inverse problems with approximate forward operators. We propose novel independent Metropolis–Hastings schemes, Latent-IMH and Proximal-IMH, that correct operator approximation error through spectral alignment and Gauss–Newton-type transformations. We quantify the impact of operator discrepancies via Jacobian log-determinant diagnostics and expected Kullback–Leibler divergence, and establish conditions under which the resulting samplers retain accuracy while significantly reducing the number of forward and inverse PDE solves.
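
For orientation, the accept/reject mechanics shared by both schemes are those of generic independent Metropolis–Hastings: proposals are drawn i.i.d. from a fixed surrogate distribution q (in the setting above, one built from the cheap approximate operator), and the Metropolis correction makes the chain target the exact posterior π. The sketch below shows only this generic scheme on a toy Gaussian example; the Latent-IMH and Proximal-IMH constructions of q are not reproduced, and all names are ours.

```python
import numpy as np

def imh(log_pi, log_q, sample_q, n_steps, rng):
    """Independent Metropolis-Hastings targeting pi with fixed proposal q."""
    x = sample_q(rng)
    lw = log_pi(x) - log_q(x)                      # log importance weight w(x)
    chain, accepted = [], 0
    for _ in range(n_steps):
        y = sample_q(rng)                          # proposal independent of current x
        lw_y = log_pi(y) - log_q(y)
        if np.log(rng.uniform()) < lw_y - lw:      # accept w.p. min(1, w(y)/w(x))
            x, lw = y, lw_y
            accepted += 1
        chain.append(x)
    return np.array(chain), accepted / n_steps

# Toy check: exact target N(1, 1), heavier-tailed surrogate proposal N(0, 4).
rng = np.random.default_rng(0)
chain, rate = imh(lambda x: -0.5 * (x - 1.0) ** 2,
                  lambda x: -0.5 * (x / 2.0) ** 2,
                  lambda r: 2.0 * r.standard_normal(), 20000, rng)
print(f"posterior mean ~ {chain.mean():.2f}, acceptance rate ~ {rate:.2f}")
```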

Across clustering, optimal design, active learning, and Bayesian inference, this dissertation advances a unified principle: combining spectral approximation theory, regret analysis, and scalable numerical linear algebra enables minimally supervised learning algorithms with provable statistical guarantees and high computational efficiency.

 

Event information

Date: Monday, April 13, 2026, 12–3 PM
Location: POB 6.304
Hosted by: George Biros